Lagrange policy gradient
Abstract
Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving linear mechanical systems quickly to a target configuration. On these tasks its performance is comparable to that of deep deterministic policy gradient, a recent action-value method.
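The approach in the abstract, adjoining the dynamics with Lagrange multipliers and running the costate equation backward to obtain a policy gradient, can be sketched on a toy problem. This is a hypothetical illustration, not the paper's implementation: a linear policy u = -Kx stands in for the multilayer network, a quadratic cost stands in for the time-optimal objective, and the double-integrator matrices `A`, `B`, the weights `Q`, `R`, and the step-size choices are all assumptions made for the example.

```python
import numpy as np

# Toy setup (assumed, not from the paper): a double integrator
# x_{t+1} = A x_t + B u_t with linear policy u_t = -K x_t and
# quadratic cost J = sum_t x_t^T Q x_t + u_t^T R u_t.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])  # position-velocity dynamics
B = np.array([[0.0], [dt]])
Q = np.eye(2)        # penalizes distance from the target (the origin)
R = 0.1 * np.eye(1)  # penalizes control effort
T = 50               # horizon length

def rollout(K, x0):
    """Simulate the closed loop: states x_0..x_T, controls u_0..u_{T-1}."""
    xs, us = [x0], []
    for _ in range(T):
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return xs, us

def cost(K, x0):
    xs, us = rollout(K, x0)
    return sum(x @ Q @ x for x in xs) + sum(u @ R @ u for u in us)

def costate_gradient(K, x0):
    """dJ/dK from one backward pass of the costate (adjoint) recursion
        lam_t = 2 Q x_t + 2 K^T R K x_t + (A - B K)^T lam_{t+1},
    the stationarity condition of the Lagrangian that adjoins the
    dynamics with multipliers lam_t."""
    xs, us = rollout(K, x0)
    lam = 2.0 * Q @ xs[T]  # terminal costate: gradient of the final cost
    grad = np.zeros_like(K)
    for t in reversed(range(T)):
        # u_t = -K x_t, so dJ/dK accumulates -(dl/du_t + B^T lam_{t+1}) x_t^T
        grad -= np.outer(2.0 * R @ us[t] + B.T @ lam, xs[t])
        lam = 2.0 * Q @ xs[t] + 2.0 * K.T @ R @ K @ xs[t] + (A - B @ K).T @ lam
    return grad

# Plain gradient descent on the policy parameters from a stabilizing guess.
K = np.array([[0.5, 0.3]])
x0 = np.array([1.0, 0.0])
for _ in range(100):
    g = costate_gradient(K, x0)
    K -= 0.02 * g / (1.0 + np.linalg.norm(g))  # normalized step for stability
```

The backward pass costs one extra rollout-length loop regardless of how many policy parameters there are, which is the usual appeal of the adjoint method; in the paper the same multipliers would propagate through a neural-network policy instead of the matrix K.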
Similar papers
An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
We develop in this article the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagr...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented v...
The Linear Nonconvex Generalized Gradient and Lagrange Multipliers
A Lagrange multiplier rule that uses small generalized gradients is introduced. It includes both inequality and set constraints. The generalized gradient is the linear generalized gradient. It is smaller than the generalized gradients of Clarke and Mordukhovich but retains much of their nice calculus. Its convex hull is the generalized gradient of Michel and Penot if a function is Lipschitz. Th...
Lagrange Multipliers for Nonconvex Generalized Gradients with Equality, Inequality and Set Constraints
A Lagrange multiplier rule for finite dimensional Lipschitz problems is proven that uses a nonconvex generalized gradient. This result uses either both the linear generalized gradient and the generalized gradient of Mordukhovich, or the linear generalized gradient and a qualification condition involving the pseudo-Lipschitz behavior of the feasible set under perturbations. The optimization problem ...
A mu-differentiable Lagrange multiplier rule
We present some properties of the gradient of a mu-differentiable function. The Method of Lagrange Multipliers for mu-differentiable functions is then exemplified.
Journal: CoRR
Volume: abs/1711.05817
Published: 2017